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We consider the problem of reliably finding filaments in point 
clouds. Realistic data sets often have numerous filaments of various 
sizes and shapes. Statistical techniques exist for finding one (or a 
few) filaments but these methods do not handle noisy data sets with 
many filaments. Other methods can be found in the astronomy litera- 
ture but they do not have rigorous statistical guarantees. We propose 
the following method. Starting at each data point we construct the 
steepest ascent path along a kernel density estimator. We locate fila- 
ments by finding regions where these paths are highly concentrated. 
Formally, we define the density of these paths and we construct a 
consistent estimator of this path density. 

1. Introduction. The motivation for this paper stems from the problem 
of finding filaments from point process data. Filaments are one-dimensional 
curves embedded in a point process or random field. Identifying filamen- 
tary structures is an important problem in many applications. In medical 
imaging, filaments arise as networks of blood vessels in tissue and need to 
be identified and mapped. In remote sensing, river systems and road net- 
works are common filamentary structures of critical importance [Lacoste, 
Descombes and Zerubia (2005) and Stoica, Descombes and Zerubia (2004)]. 
In seismology, the concentration of earthquake epicenters traces the fila- 
mentary network of fault lines. This paper is motivated by a cosmological 
application: the detection of filaments of matter in the universe. 

Figures 1 and 2 show two examples. The first example is a cosmology 
data set showing positions of galaxies. The second is a synthetic example. 
In each case, the upper left plot shows the data, which exhibits an apparent 
filamentary structure. A density estimate reveals this structure more clearly. 
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Fig. 1. Cosmological data: data set (plot A), steepest ascent paths of all data points (plot 
B), paths after cutting the first five iterations (plot C) and level set at 90% quantile of the 
estimated path density (plot D ). 

In particular, the steepest ascent paths of the density estimate — the paths 
from each point that follows the gradient — tend to concentrate along the 
filaments. The upper right plot in each figure shows the collection of paths, 
and the lower left plot trims off the early iterations. The result, on the bot- 
tom right in the figures, shows a high density of paths around the filaments. 
The empirical observation that the steepest ascent paths concentrate around 
the filaments motivates our approach in this paper. Specifically, we charac- 
terize the density of steepest ascent path — what we call the path density 
below — and construct a consistent estimator of the path density using the 
steepest ascent paths of a density estimator. Our purpose in this paper is 
to explore this idea in detail. We apply this idea to the filament problem by 
showing that the path density is high near filaments and that the set where 
the estimated path density is large essentially capture the filaments. 
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Fig. 2. Simulated example: data set (plot A), steepest ascent paths of all data points 
(plot B), paths after cutting the first three iterations (plot C) and level set at 90% quantile 
of the estimated path density (plot D ). 

There are existing techniques for finding one (or a few) filaments. Exam- 
ples from statistics and machine learning include Arias-Castro, Donoho and 
Huo (2006), Kegl et al. (2000), Sandilya and Kulkarni (2002) and Tibshi- 
rani (1992). But none of these methods are practical for noisy data sets with 
large numbers of complicated filaments. Other methods can be found in the 
astronomy literature: see Stoica et al. (2005), Eriksen et al. (2004), Novikov, 
Colombi and Dore (2006), Sousbie et al. (2006) and Barrow, Bhavsar and 
Sonoda (1985). But these methods do not have rigorous statistical guaran- 
tees. Thus the problem of reliably finding many filaments simultaneously 
remains largely unsolved. Hence our search for new methods. We would like 
to emphasize that the methods proposed here are still very far from being a 
complete methodology for filament analysis. Rather our results are a mod- 
est first step toward a novel and promising methdology. A recent paper on 
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estimating integral curves is Koltchinskii (2007), although the setting and 
methodology are quite different. 

Let us now give a heuristic description of the basic problem we study. 
Let Xi , . . . , X n be a sample from a distribution fix with density gx ■ We 
estimate the function 

, ¥(P(X) intersects B (x,r)) 

v(x) = hm , 

v ' r-*0 r 

where P(X) is the steepest ascent path defined by gx starting at X and 
B(x,r) = {y : \\y — x\\ < r}. We use balls here because they are simple, but 
any suitably rich collection of open neighborhoods yields the same function. 

Our main results ell clS follows. Theorem 6 explicitly characterizes the 
path density p(x). Theorem 3 constructs an estimator p n such that 

where h n and v n are appropriately chosen bandwidth parameters. We then 
apply these ideas to the problem of identifying filaments from point-process 
data and show, in Theorem 4, that the level sets of the path density con- 
centrate around the filaments. 



1.1. Notation. We assume a probability space (fi,<S,P) on which all our 
random variables are defined. We consider random variables taking values 
in ]R 2 . Throughout the paper, Uq denotes a large, fixed compact subset of 
R 2 whose properties are specified later. 

Let B(x,r) denote the open ball of radius r centered on x. Denote the 
closure of B(x,r) by B(x,r) and its boundary by dB(x,r). For any set A, 
define the r-dilation by 

(1) B(A,r)= |J B(x,r). 

We use 1a (x) as the indicator function of the set A. 

If f -.W 1 -^-W 71 , then we use Df, D 2 f, Dif to denote various total and 
partial derivatives. Specifically, Df is a linear map from M n — > M m , D 2 f 
denotes the Hessian matrix of /, and Dif is the partial derivative of / 
with respect to the ith argument. When m = 1, Df is a gradient, which is 
convenient to view as a vector in W 1 . For this purpose, we use V/ to denote 
the column vector {Df) T . 

We use (f> to denote the univariate standard normal density and $ the 
corresponding distribution function. For a > 0, (/)& and $ CT are the corre- 
sponding iV(0, a 2 ) functions. In M. d for d > 2, we use tp for the <i-dimensional 
standard normal density and cp a for the N(0, diag(<7 2 , . . . ,a 2 )) density. 
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1.2. Outline. In Section 2, we define and characterize the path density 
function. In Section 3, we define an estimator of the path density function 
and find its rate of convergence. Section 4 describes the problem of iden- 
tifying filaments from point-process data, showing that the path density is 
large near filaments and small elsewhere. Proofs are relegated to Sections 5 
through 7. We close with some general remarks in Section 8. 

2. Integral curves and path densities. 

2.1. Flows and the local group. If V is a smooth (C^ for £ > 1) vector 
field on R 2 , then we might imagine putting a test particle at any i£M 2 and 
letting it flow with velocity given by the vector field. 

It is known [the Picard-Lindelof theorem, Irwin (2001)] that this idea 
is well defined. The paths followed by the test particle starting at differ- 
ent points fit together in a consistent way. In particular, in any neighbor- 
hood U of x £ R 2 , there is a neighborhood U\ C U, an interval IcM con- 
taining and a C"» mapping ip:IxUi—>U such that (i) ip(0,x) = x, (ii) 
-§£ip(t, x) = V(ip(t, x)), and (iii) if s,t,s + t £ I , ip(s + t, x) = ip(s, ip(t, x)). 
For each t, we define a mapping i^x = ip(t,x) that gives the point obtained 
by following the flow from x for time t. In these terms, property (iii) be- 
comes ip s+t =^0^', giving a group-like structure with composition as the 
product. Because of this, the mappings ip* are called the local one parame- 
ter group of diffeomorphisms generated by V . Moreover, the paths t \— ► ip t x, 
called integral curves or local flows of the vector field, are unique in the 
sense that integral curves are equal where their domains overlap. Thus, the 
integral curves create an equivalence relation on R 2 , where two points are 
equivalent if they are on the same integral curve. 

In certain cases, these local flows can be extended to a global flow, a 
mapping i/):KxI 2 -»K 2 satisfying properties (i), (ii) and (iii) above with 
/ = R. The mappings ip t : R 2 — > R 2 are C"> diffeomorphisms that form a group 
under composition. Such global flows exist, for instance, when the domain 
of the vector field is compact, when V has compact support or when the 
domain is a Banach space [see Theorem 3.39 in Irwin (2001)], which is the 
case for R 2 . 

We derive a vector field from the gradient of the density gx- A critical 
point of gx is one where the gradient of gx equals 0. Any other point is 
called a regular point of gx- We make the following assumption in what 
follows. 

Assumption 1. is a C"> +1 function on I 2 for 2 < ( < co. 

Assumption 2. All the critical points of gx are nondegenerate, meaning 
that the Hessian is nonsingular. 
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Assumption 3. 



(2) 



lira 



inf 

B 

diam(_B)= 



sup 

dB 



l|V 5 x|| 



0. 



where the infimum is over closed balls in R 2 of given diameter d and where 
riB is the outward pointing, unit normal vector field defined on dB. 



The importance of Assumption 2 is that the behavior of gx around non- 
degenerate critical points is locally quadratic. See Remark 3 below. Nonde- 
generate critical points are necessarily isolated. Moreover, by Assumption 
3, all gx' s critical points lie in a compact set. These facts together with 
Assumption 2 imply that the critical points of gx are finite in number. 

Assumption 3 ensures that the gradient of gx is nearly radial far enough 
out from some point. This condition is satisfied by a wide variety of common 
distributions whose critical points lie within a compact set. For example, in 
Section 4, we show that the results extend to quite general mixtures of 
normal distributions. A stronger alternative assumption that is easier to 
understand is that gx has compact support, which is certainly sufficient for 
the results that follow. 

From the assumptions above it follows that V = V gx gives a -vector 
field on R 2 and therefore generates a unique global flow tp. Here, the flow 
ij^x moves along the direction of steepest ascent. If the support of gx is not 
necessarily compact, as in Section 4, we can restrict to a compact set Uq 
containing all critical points of gx- Restriction to Uq requires that we define 
an interval I x = [a x ,b x ] such that ^x G Uq whenever t £ I x . If Assumption 
3 holds, the interval is the entire real line, but we maintain the interval 
notation in the proofs to facilitate extension of the results in later sections. 

The global flow ip on R 2 generates an equivalence relation on Uq (or R 2 
for that matter). For x,y £ Uo, say that x ~ y if i^y = x for some t 6 R. 
We say that x precedes y,x^y(x^y),i( ip^y = x for t > (t > 0) and y 
succeeds x, y >z x (y y x), if i^y = x for t > (t > 0). Here precedence and 
succession refer to the flow in directions of increase of gx, that is, along the 
vector field V. For any A C Uo, define the reverse evolution of A under the 
flow by 

(3) V(A)={y£U :y±x,x£A}, 

the set of points in Uo that precede a point in A. With minor abuse of 
notation, we also write V(x) = V({x}). 

A simple example may clarify these definitions. Let g(x) = — i|| x 1 1 2 , a 
simple quadratic. The gradient of g is a vector field that at each point x 
gives the direction of steepest ascent Vg{x) = —x. This vector field specifies 
differential equations at every point x:7(£) = —j(t) with 7(0) = x, where 
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Fig. 3. 



Flow lines in a typical gradient field, moving toward a local maximum. 



7 = (71,72) is a curve, that is, a function from R to R 2 . This is the integral 
curve. The mapping ip packages together all these integral curves, and in 
this example, ip f x = {xie~ l , a^e - *). Figure 3 illustrates the flows in a generic 
case. 

As we will see in Theorem 4, the integral curves of the flow -0 concentrate 
near the filaments. We would thus expect that following the flow induced 
by the gradient of gx will carry a collection of random points near to the 
filaments. This is the idea behind the estimators described in the first section. 
We quantify this concentration through the path measure tt, which we define 
to be the probability that the flow from a random point hits a given set. 
That is, for A C Uq, 

(4) n(A)=n x (y(A)). 

This is a sub-additive set function (in fact, an infinitely-alternating Cho- 
quet capacity), and does not have a density in the Radon-Nikodym sense. 
However, we show below that the following density is well defined. 

Definition f. The path density associated with gx is the function 
p : R 2 —> [0, 00] defined by 

(5) p(z) = Um*W X > T) \ 

r— >o r 

Remark f . One might have expected the denominator in the expression 
above to be r 2 since we are working in R 2 but, as it turns out, n(B(x, r)) is 
order r. 

Remark 2. To clarify, ir is a set function with n(A) giving the prob- 
ability that for a random point drawn from gx, the flow from that point 
hits A. The path density p is a function on R 2 with p(x) describing the 
7r-probability in infinitesimal neighborhoods of x. See Figure 4. 
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Fig. 4. A set A represented by the white disk and the set of points whose flow lines hit 
A, the union of A and the shaded gray region. n(A) is the probability content of the latter 
set. The path density is the limit of these probabilities, suitably normalized, as A shrinks. 

The next theorem gives the main features of the path density. A more 
detailed characterization of p is given by Theorem 6 in Section 6. Let M 
denote the set of local maxima of gx and H denote the set of saddlepoints 
of gx- 

Theorem 1. Under Assumptions 1-3, the path density p is an upper- 
semicontinuous function with the following properties: 

1. if x is a local minimum of gx , p(%) = 0; 

2. if x is a local maximum of gx, p{x) = oc; 

3. p is continuous on (Ai L)7i) c and bounded on A4 C . 

Remark 3. The result of Theorem 1 depends on the nondegeneracy of 
gx's critical points. While nondegenerate critical points are isolated, degen- 
erate critical points need not be. For example, a density with a linear ridge 
can produce a line of local maxima. A mixture of normals over two closely 
spaced, parallel lines would generate a similar ridge between the lines. In 
such cases, the local behavior of the function around critical points need not 
be local quadratic as it must around nondegenerate critical points. Near a 
ridge, for instance, the flow lines from two neighboring points will proceed 
in parallel toward the ridge rather than converging on a local maximum. As 
such, p is finite at any point on the ridge, although Tr(B(R,r))/r — > oo as 
r — > where R is the set of points on the ridge. 
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The good news is that degeneracy is rare in the sense that functions with- 
out degenerate critical points are dense in the space of smooth functions. The 
proof of this follows from Sard's theorem by adding to gx a linear function 
that is arbitrarily small over Uq\ the linear function changes the locations 
of the critical points without changing the Hessian and almost every linear 
function eliminates the nondegeneracy. [See Guillemin and Pollack (1974) 
for further discussion of this point.] 

The next theorem gives a more precise approximation of the path measure 
in terms of the path density. It will be used extensively in what follows. 

Theorem 2. Let x be a regular point of gx- Then, the following hold: 

1. As r ^ 0, 

(6) ir{B(x,r)) = rp(x) + 0(r 2 ). 

2. There exists a curve -y, parameterized by arc length, such that 

(7) n(B(x,r)) = \ f_p{l{t)) dt + 0(r 2 ). 

3. Estimating the path density. Recall that X\, . . . ,X n are independent 
observations from the distribution fix with density gx- Our goal is to esti- 
mate the path density p. 

First we estimate gx using the kernel estimator 

i=i 

Given an arbitrary point i£R 2 , let P{x) = {ij^x : < t < oo} denote the 
integral curve starting at x and let P(x) denote the integral curve obtained 
by replacing gx with g n . [Recall that ip x is the point obtained by starting 
at x at time and following the vector field until time t. The integral curve 
P(x) shows the whole path going forward in time.] We define the path density 
estimator 

i n i /inf . \\x — z\\ \ 

( 8) e < -^r • 

i=i 

Thus, p n { x ) is essentially a weighted count of how many observed paths 
P(X\), . . . , P(X n ) get close to x. It is not necessary to use the same kernel 
K for g n and p n , but for simplicity we shall take them to be the same up to 
a normalizing constant. 
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Remark 4. A numerical approximation to P{x) can be obtained using 
the mean shift algorithm [Fukunaga and Hostetler (1975) and Cheng (1995)]. 
Given an arbitrary point x £ M 2 , the mean shift algorithm defines a sequence 
s(x) = (x° = x, x , x 2 , . . .), where 

(q) k+i = EUX i K(l/h\\x k -X i \\) 

{ ' " Eum/hWxx-XtW) • 

The sequence s(x) converges to a mode of g n . Conversely, for each mode of 
g n there exists a point x such that s(x) converges to that mode. A smooth 
interpolation of the points in s(x) can be regarded as a numerical approxi- 
mation to P{x). 

We make the following assumptions about the kernel K. 

(Kl) K : [0,oo ) — ► [0, oo) is a nonincreasing, bounded, square integrable 
function and has bounded derivative. 

(K2) K satisfies the Gine and Guillou (2002) conditions, namely, K be- 
longs to the linear span of functions with the following property: the set 
{(s,u) -K(s) > u} can be represented as a finite number of Boolean opera- 
tions among sets of the form {(s,u) :a(s,u) > b(u)} where a is a polynomial 
and b is any real function. 

(K3) K satisfies the following tail condition: as x — > oo 

K(x) = 0(xe~ x ). 

Condition (K2) is somewhat abstract but, unfortunately, such a technical 
condition appears to be necessary. The role (K2) plays in the proof is to 
ensures that the VC-condition holds, thus allowing uniform control of an 
empirical process. The good news, as Gine and Guillou (2002) point out, is 
that (K2) is satisfied by most common kernels. 

Theorem 3. Suppose that K satisfies (K1)-(K3), and that the band- 
widths h n and v n satisfy the following conditions: 

h n nh n I log K\ , 2 . ,2 

hn->v, - — —r^oo, - — >oo, h n <cih 2n , 

| log h n | log log n 

for some c\, and 

nv n | log v n | 

Vn -> 0, r -> OO, > OO, V n < C2V2n-, 

|logz/ n | log log n 

for some C2- Further, assume that gx is bounded. Then, 



SUp \pn(x) - P (X)\ = Ojj^lm + P / l0gre 



(10) 

+ 0(u n ) + 0(h 2 n ), 
where the sup x is taken over all regular points. 



nh n 
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The best rate is obtained by setting h n x (logn) 1 / 4 /^ 1 / 8 and v n x logn/n 



1/3 



Corollary 1. Setting h n x (logn) 1 / 4 /^ 1 / 8 and u n x logn/n 1 / 3 we have 



where the sup^, is taken over all regular points. 

4. Filament detection. Galaxies, being large and bright and having a 
tendency to cluster together, serve as tracers of matter. At large enough 
scales, the universe looks the same in every direction, so astronomers were 
surprised when their maps of the galaxy locations revealed complicated, sys- 
tematic structures — clusters, walls, sheets, and voids. But the most striking 
feature in these maps is the vast network of filaments, often called the "cos- 
mic web." Panel A in Figure 1 shows a map from one such survey, with the 
galaxy locations selected from a two-dimensional slice of the universe. As- 
tronomers want to identify the filaments to help them characterize this large 
scale structure and in turn constrain the physics of the universe's evolution. 

Astronomers have substantial literature on the problem of estimating fil- 
aments. Early efforts involved the use of standard spatial techniques, includ- 
ing high-order correlations, shape statistics from Luo and Vishniac (1995) 
and Minkowski functionals. [See the book by Martinez and Saar (2002) for 
an overview of such approaches.] Barrow, Bhavsar and Sonoda (1985) ap- 
plied minimal spanning trees to the problem, but these trees give primarily 
local descriptors of neighorhood relations. The skeleton method [Stoica et al. 
(2005), Eriksen et al. (2004), Novikov, Colombi and Dore (2006) and Sousbie 
et al. (2006)] involves smoothing the point distribution to estimate the skele- 
ton: a network connecting saddlepoints to local maxima of the point density 
with edges parallel to the gradient of the density. Stoica et al. (2005) de- 
velops an automated method for tracing a filamentary network based on 
marked point processes; Stoica, Martinez and Saar (2007) extends this to 
three-dimensional networks. 

In the statistics literature, filaments are similar to, but distinct from, 
principal curves [Hastie and Stuetzle (1989)]. Hastie and Stuetzle (1989) 
define a principal curve c by the self-consistency relation c{t) = E(X|7r c (X) = 
t) where tt c {x) is the projection of x onto c. Modified definitions are given by 
Kegl et al. (2000) and Sandilya and Kulkarni (2002). They call c a principal 
curve if c minimizes E\\X - vr c (X)|| 2 subject to c lying in a prespecified set 
of curves such as all curves of length bounded by some constant or with 
bounded curvature. In either case, they show that there exist estimators of 
a filament with convergence rate 0(n~ 1 ^). However, as noted by Hastie and 
Stuetzle (1989), the principal curve c is not equal to the filament. 
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Recently, Arias-Castro, Donoho and Huo (2006), considered the prob- 
lem of finding a single filament in a noisy background. They obtain precise 
minimax bounds on how sparse a filament can be and still be detectable. 
However, their method does not easily extend to the case where there are 
many filaments. Also, they can only detect the presence of a filament but 
they do not produce an estimate of the filament itself. 

4.1. A model for filaments. We assume that the data are a realization 
of an inhomogeneous Poisson process on Uo such that conditional on the 
number of observations, n, we have n i.i.d. draws X%, . . . , X n from the density 
gx on Uo. We model gx as a mixture of three types of components: filaments, 
clusters and the background. 

A filament is a smooth, nonself-intersecting curve / : [a, b] — > Uo- We con- 
sider a finite collection of C 2 filaments f\ , . . . , f mp of lengths £\, . . . , l mF , 
which are allowed to be positioned arbitrarily within Uq, including the 
possibility that distinct filaments intersect. Because our inferences are in- 
dependent of how the curve is parameterized, we parameterize the curve 
by arclength, taking a = and b = £(f), the length of / in ]R 2 . Define 
T = Uj=i ran § e (/i) to be the filament set. Clusters are concentrations of 
points around a cluster center z. We consider a finite collection of clusters 
which we denote C = {z mF +i, ■ ■ ■ , z m }. The background generates points that 
are not on filaments or in clusters, and we model it as a homogeneous Poisson 
process. 

Specifically, gx takes the following form: 



where C denotes the Lebesgue measure and each i = 1, . . . ,mp, is a 
strictly positive probability density on [0, <7i, . . . , a m > 0, and < a\, . . . , 
a m < 1 with J2i a i = 1- Recall that (p a denotes the symmetric normal dis- 
tribution over R 2 . We define \ix to be the probability measure on M? with 
density gx- Generally we take all Oi equal to a common value a. 

Our model for the filament data is a generalization of the model in Tib- 
shirani (1992): select a random point along the filament and generate from a 
normal centered on that point. Note that clusters are zero-length filaments, 
with the corresponding uii equal to a delta function at 0. 

When «o = 0, we say that gx is background free. In this case, gx is a C°° 
function on all of M 2 . Our results from Section 2 apply to the background- 
free case, but they extend directly to the case where «o > because Vgx is a 




i=l 



(12) 



rn 



+ atnp Ux (x-Zi) 



i=rriF+l 
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scaled version of the background- free gradient on Uq. We can either restrict 
the global flow to Uq or adapt it to Uq by constructing a C°° function uj with 
compact support that equals 1 on Uq and is zero outside a small dilation of 
Uq. Our results carry over immediately when M. 2 is replaced by S 2 , the two 
sphere, because S 2 is a compact, two-dimensional manifold. 

Under model (12), we can without loss of generality take Uq large enough 
to contain all the critical points. This is true because, in the background-free 
case, solving Vgx(x) = gives 



(m. F g. m 

^2 a i fi(s)wi(s)ip ai (x - fi(s))ds + Zianp(ai(x - Zi)) 

i=l J ° i=m F +l / 

/rn F t . m \ -1 

x \y2 a i J o Wi(s)tp ai (x - fi(s))ds + ^2 a>i(p(<Ji(x - Zi)) j . 



(13) 



\i=l i=mp+l 



where the terms come from (12), and so x must lie in the convex hull of 
J- U C, which is compact. (A background does not change the gradient on 
Uq, leading to the same conclusion.) By Assumption 2, all critical points of 
gx are isolated, and therefore M. and H and the set of local minima are all 
finite. 

Note that under our filament model, gx does not have compact sup- 
port. However, because the filaments are contained in a compact set, gx 
has normal tails in all directions outside large enough balls. In particular, 
|| |jvg^|| — n || ^ uniformly as the ball radius increases, where n is the 
outward pointing normal to the ball's boundary. This ensures that any con- 
tribution to the path density from points outside Uq is negligible. 

4.2. Capturing filaments. Now we show that the flow lines concentrate 
near the filaments. We do that by showing that the sets on which p exceeds 
some value A lies in a set that is close to the filaments and getting closer as 
A grows. Two complications are the local maxima, at which p{x) = oo and 
thus exceeds any A, and saddle points at which p{x) is roughly four times 
the value of p in a neighborhood around x. Define a = maxj Oi and 



(14) d(A) =CTA/21og(l/27rc7 2 A). 



Theorem 4. Let A = J-L)C C Uq, the set of points that are on a filament 
or at a cluster center. Let 7i denote the set of saddle points of gx ■ Define 



(15) 



£x = {xeU :p(x)> A}. 
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Then, for all e > and all A > e, 

(16) £ x n M c C B(A, d(A) +c)u(Hn B(A d(4A) + e)), 

(17) £ x nM c nH c cB(A,d(\) + e). 

Theorems 1 and 4 together imply that {x : A < p(x) < oo} C -B(„4, d(4A) + 
e) and that {x:X< p(x) <oo}nH c C B(A, d(A) + e). 

Corollary 2. T/iere exists A a such that A a CA, 

AvCSxCBiA^diX)) 
and du (A a , .4.) — > as a — > . Hence, for every £ > i/iere exists A smc/i t/iai 

d ff (A^)<C + o(^)- 

This corollary says that, as long as the noise level <r is small, the level 
sets approximate the filaments. 

We make use of results in Cuevas and Fraiman (1997). A set S is standard 
if for every A > there exists < 5 < 1 such that 

£(B(x,e)nS) > 5C{B(x,e)) 

for every x G 5, < e < A. 

Theorem 5. Suppose that K satisfies conditions (K1)-(K3), and that 
£\ is standard. Let T = {x : p(x) > A + c n } where c n > and — * 0. Suppose 
that (3 n — > oo, /3 n z/„ — > oo, and f3 n u n /c n is bounded. Then, 

(3 n d(£\, s x ) a Ao. 

Hence, for every £ > 0, t/iere exists A > suc/i t/iat di^^^,^) =Op(<r + ^). 

Remark 5. The theorem shows that the level sets approximate A a 
which is itself and approximation to A. We know that A a is close to A but 
a more precise statement would require specific assumptions on the shape 
of the filaments. 

4.3. Examples. In this section we briefly consider an example based on 
galaxy data and a synthetic example. 

Example 1. For the simulated example, the vertices of a pentagon have 
been randomly selected over [0, l] 2 . On each side of the pentagon, points 
were drawn according to a Beta(^,^) distribution. The total of n = 500 
points is divided among the sides proportionally to the side lengths. The 
resulting data set, shown in Figure 2A, was obtained adding a bivariate 
normal perturbation (with a = 0.03) to the points. 



ON THE PATH DENSITY OF A GRADIENT FIELD 



15 



Figure 2B and C shows that all the steepest ascent paths move toward 
the perimeter of the pentagon and then along it until they reach a mode. 
The level set at the 90th percentile of the estimated path density seems to 
be a good approximation to the pentagon perimeter (see Figure 2D). 

To see the effect of noisy background, 500 uniform points in [0, l] 2 were 
added to the data set. The whole procedure was implemented again on 
the augmented data set. Figure 8 shows that the presence of background 
introduces some noise, but the procedure still catches the filaments. 

Example 2. The filament estimation procedure has been tested also 
on cosmological data. The data set analyzed is part of the mini "SDSS 
ugriz" catalogue, produced by Croton et al. (2006) and publicly available 
online at www.mpa-garching.mpg.de/galform/agnpaper/. Along with other 
information, the catalogue gives the position of all galaxies in the cubic box 
[0, 62.5 Mpc/h] 3 . To deal with a two-dimensional data set, we selected the 
galaxies whose z-coordinate is in the interval [20,25] Mpc/h and considered 
only the first two coordinates. The resulting data set, shown in Figure 1A, 
contains the (x, y)-coordinates of n = 2435 galaxies. Plots B, C and D in 
Figure 1 show a reasonably satisfactory behavior of the filament detection 
procedure. 

5. Discussion. Our results are a first step toward developing methods 
for finding filaments without imposing strong assumptions about those fila- 
ments. However, much work needs to be done to turn the theory into prac- 
tical methodology. 

First, we need good, data-driven methods for choosing the bandwidths h n 
and v n . We conjecture that cross-validation methods could be quite effective. 

Second, extracting the filaments from the path density estimator p n is 
obviously very important. To find high local concentrations of paths, one can 
trim the observed paths or choose high level sets, as we did in the examples. 
These approaches, and possibly others, deserve careful examination. 

Third, we conjecture that these methods will generalize to three dimen- 
sions. The main challenge in doing so lies in the characterization of the path 
density, which requires handling an additional class of singular points and 
generalizing part of the proof from using closed curves to using closed sur- 
faces. Both steps are straightforward but tedious. Generalization to higher 
dimensions remains an open but interesting problem; related problems in 
computational topology have proven difficult in high dimensions in some 
cases. 

Finally, we think that the mathematical methods we have developed could 
be useful in other problems as well. 
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6. Proofs for Section 2. We begin with an explicit characterization of 
the path density p that includes the statement of Theorem 1. Throughout, 
we let U = M? and let Uq CU he a large compact set that contains all the 
critical points of gx m its interior. 

Theorem 6. Under Assumptions 1-3, the path density p is an upper- 
semicontinuous function with the following properties: 

1. If x is a local minimum of gx, p(x) = 0. 

2. If x is a local maximum of gx, p(x) = oo. 

3. If x is a saddlepoint of gx, then < p(x) < oo and there exist four se- 
quences of regular points (x J k ), for j = 1, ... ,4, all converging to x such 
that 

4 

(18) p(x) = lim^2p(xi). 

n ~ >00 j=i 

4. If x is a regular point of gx, then the following hold: 
(a) 

(19) P ( X ) = ^ 9BM) . 

i — >o r 

(b) For all sufficiently small r > 0, there exists a curve 7 of and a r < < 
P r such that (i) length(7[a r , 0]) = r + 0(r 2 ), (ii) length^fO, /3 r ]) = 
r + 0(r 2 ) and (iii) 

(20) p(x) = lim Z^MKlM) . 

r— »o r 

(c) There exists a smooth function h : IA x K — ► [0, 00) such that for any 
regular point x of gx 

POO 

(21) p(ip~ t x)= / g x (4>~ t ~ s x)h(x,t + s)ds. 

Jo 

5. p is continuous on (M L)7i) c and bounded on Ai c . 

To prove Theorem 6 we need two lemmas whose proofs are reported at 
the end of this section. Let D 2 gx{x) denote the Hessian matrix of gx at x. 

Lemma 1. For x E Uq, let r^ > be small enough so that B(x,tq) con- 
tains no critical points of gx, other than possibly x itself. Then, for r < ro, 
there exists a constant C > such that for y G B(x,r) 

\gx{y) - gx{x) - ^gx{x) -(y-x)-\(y- x) T D 2 g x (x)(y -x)\< Cr 3 , 

\\Vg x (y) - Vg x (x) - D 2 g x (x)(y - x)\\ < Cr 2 , 

where C is independent of r and y but possibly dependent on ro . 
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Lemma 2. Let 7 : [— 1, 1] — > Uq be a nonintersecting curve such that 7Q— t, t]) 
has length £ t and such that *y(t) is a regular point of gx for every — 1 < t < 1. 
Then, there exists a < t < 1 suc/i i/iat, /or all < s <t, 

(22) £(V( 7 ([-s,*])))<C4 + 0(0, 

where C > is independent of s but possibly dependent on x and t. 

Proof of Theorem 6. 1. The idea of the proof is to show that in 
small enough balls around x, the integral curves closely approximate those 
from a quadratic function around x. If gx were quadratic at x, then the 
flows carry every point in B(x,r) radially toward x. Hence, ir(B(x,r)) < 
fix{B(x, r)) = 0(r 2 ). The details that follow require some careful arguments 
because we make minimal assumptions about gx other than smoothness. 

We begin by constructing two small open balls Wo C V around x with 
four properties: (i) V contains no critical points of gx other than x, (ii) the 
Laplacian V 2 gx > on V, (iii) gx is locally quadratic on V in the sense 
that gx(y) = 9x{x) + h\(y) + h 2 (y), for a smooth function h, and (iv) the 
reverse evolution of Wo remains within V, V(Wq) C V. 

Note first that ip l x = x for all tGl. Because the Laplacian V 2 gx is con- 
tinuous and positive at local minimum x, there is an open ball V\ around x 
on which S/ 2 gx > 0. By Morse's theorem [Milnor (1963)], there is a neigh- 
borhood V2 of x and a diffeomorphism h from V2 to an open ball A 
around in M. 2 such that h(x) = and for y £ V2, 

(23) gx{y)=9x(x) + h 2 {y) + h 2 (y). 

Note V2 satisfies (i). Let V be an open ball around x contained in V± D V2. 
Then V satisifies properties (i), (ii) and (iii) above. 

To find Wo satisfying (iv) we need to get close enough so that the quadratic 
behavior dominates any wandering of the integral curves outside of V. Define 
q:A— > M by q(z) = \\h~ 1 (z) — x\\ 2 . This is a C"» function with q(0) =0 and 
hence Lipschitz. Thus, there exists an open ball A\ C A such that q(z) < c\\z\\ 
on Ai, for some constant c. Hence, there is an open ball Wo C h~ 1 {A\) 
around x such that V(Wo) C V. 

Next we show that for any open ball W C Wq around x, its reverse evolu- 
tion must stay within its closure, V(VF) C W. This is equivalent to showing 
that integral curves from within W cannot cross the boundary dW. To do 
this, we will use the following result. Let C C V be any region bounded by 
a piecewise smooth closed curve. By Stokes's theorem and by property (ii) 
of V above, 



(24) 



/ (-Vg x )-n= f (-VV)<0, 

JdC JC 
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where n is an outward pointing, unit vector normal to dC. (The application 
of Stokes's theorem comes by defining the differential form uj = —v% dx + 
v\ dy, where v = — Vg y . Then, du = —V 2 gx dx A dy and / c w = J v • n.) 

Apply (24) with C = W . If the integrand on the left, (— Vgx) ■ n, is ever 
positive on dW, it must, by continuity, be positive on some arc (01,02) C 
dW. Suppose then that the integrand on the left is nonnegative on such 
an arc (01,02). We will construct a closed curve bounding a region C C V 
to which we will apply (24). Construct the closed curve by jointing the 
following three curves, in order: a curve from x to ai obtained by t £ [0, 1] 1— > 
hT 1 (th(ai)), the arc (01,02) and the curve from 02 to x obtained by t G 
[0, 1] 1 — ► o _1 ((l — t)h(a,2)). Note that the first and last of these curves have 
two important properties: (i) their image is contained in V, and (ii) they 
follow integral curves of tp in one direction or the other. The latter is implied 
directly by (23). As a result, the integrand on the left-hand side of (24) is 
zero over the first and last segments and nonnegative over the middle one, 
which is a contradiction equation (24). 

As a result, for sufficiently small r > 0, an open ball B(x,r) satisfies 
V(B(x,r)) C B(x,r), and thus 

(25) P (x) = lim < lim °& = 0. 

r^o r 1 — >0 r 

2. We can apply the same argument as in item 1 except with Morse's 
theorem giving the representation 

(26) g x (y)=gx(x)-h\(y)-hl{y) 

and thus reversing the sign of the Laplacian. This shows that for sufficiently 
small ro > 0, an open ball B(x,ro) satisfies V(B(x,ro)) D B(x,ro). Moreover, 
this implies that for any r < r , V(B(x,r)) D B(x,r ). Hence, 

(27) P (x) = lim r » > lim ^Cg^g)) = ^ 

r^O r r^O r 

3. Note again that ij^x = x for all t G M and again that Morse's theorem 
implies the existence of a neighborhood U of x and a diffeomorphism h from 
U to an open ball A around in M 2 such that h(x) = and for y & U, 

(28) g x (y)=gx(x)-hl(y) + hl{y). 

Define £ as the global flow generated by the vector field V 1 - = KVgx, where 
R is a fixed 90-degree rotation matrix. 

Choose r > such that B(x, r) C U and B(x, r) contains no critical points 
other than x. For every < t < r, let St = h(dB(x,t)). This is a smooth 
closed curve around in A. In the h coordinate system in A, the integral 
curves of the vector field are hyperbolas of the form h\hi = c for some 
constant c, with the reversed (negative gradient) flow traveling from large 
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| 1 and small \h\\ to small |/i2| and large \hi\. (Here we are treating h\ 
and /12 as the coordinates under the change of variables induced by the 
diffeomorphism h.) Specifically, in A, the flow through a point (foi, /12) takes 
the parametric form s i— > (h\e 2s , /i2e _2s ). It follows that there is at least 
one point in each quadrant where the product I/11/12I attains its maximum 
on St, and that the set of such points in each quadrant is a closed set 
(by continuity). It follows that if there is more than one such point, they 
must all be on the same contour and that therefore, because the set is 
compact, there must be a unique point that succeeds the others under the 
partial order y. Let Z{(t) E St, for i = 1, . . . ,4, denote these points, and let 
yi(t) = h(zi) £ dB(x,t). In addition, let z + (t) and z_(t) denote the points at 
which St intersects the positive and negative h\ axis, respectively, and let 
y+(t) = h(z + (t)) and y~(t) = h(z-(t)). Here, the subscripts 1, ... ,4 indicate 
the associated quadrant and the subscripts + and — indicate association 
with the positive and negative hi axis, respectively. Take z(0) = and y(0) = 
x in each case. 

The six curves in the original space y±,y2,y3,y4,y-,y+ are smooth and 
parameterized by arclength. Moreover, since the integral curves are tangent 
at each of the points yi(r), for i = 1, ... ,4, each curve yi traces an integral 
curve of the flow £. 

From these, we construct four closed curves, 71,72,73 and 74, one per 
quadrant, as follows. The first travels along y\ from x to yi(r), then along 
the arc of dB(x,r) to y+(r), then along y + from y+{r) back to x. The rest 
are analogous, traveling x — ► y+(r) — > 2/2(0 — > x — > y±{r) — ► y_(r) — > x and 
x — > y~(r) — ► 2/3 (r) — ► x. We choose the time index along the circular arcs so 
that the j^s are parameterized by arclength. Note that the range of 7, is 
restricted to quadrant i by construction. 

Let C denote the region (including the boundary) enclosed by 7 curves. 
This consists of two "lobes," one joining the region enclosed by the two 7 
curves that intersect along the range of j/4. and the other joining the regions 
enclosed by the two 7 curves that intersect along the range of y_. The two 
lobes intersect only at x. Partition B(x,r) = Z + L + R, where Z = B{x, r) D 
/i -1 ({(u,0) G A:u^0}) is the image of the h\ axis; L = B(x, r) n (C — Z) 
is the union of the two lobes minus the axis; and R = B(x,r) — L — Z is 
everything else. The construction of the y^s shows that these sets satisfy the 
following: (i) every point in R preceeds a point [which is neither x nor the 
i/j(r)'s] on one of the y\ curves; (ii) Z is the image of a curve (and thus a set 
of measure 0), so every point in Z precedes (under ^) one of the two points 
in {y + (r), y_(r)}; (iii) every point in L is succeded (under >z) by a point 
(which is neither x nor the y^s) on one yi piece of the corresponding curve. 
Note that the integral curves from each j/j piece do not cross the "axis" 
/i({(n,0) £A:u^0}). (These facts are easiest to see in the transformed 
space A, even though the curves need not be radial or arc-like in that space. 
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The ZiS are the latest points on the highest integral curves. Z is the image 
of the h\ axis which creates two integral curves.) 
It follows then that 

(29) V(B(x,r)) = {x} U V( 7 i((0,r])) U • • • U V( 74 ((0,r])) U R U V{Z), 

where this union is disjoint. Moreover, C(R) < 7rr 2 and C(V(Z)) = 0, so 
neither component contributes to p(x). Because the y^s follow the integral 
curves of £, for any < t < r, p(yi(t)) = lim^o ir(yi((t - 5,t + S)))/S by- 
Theorem 6 (4b) — proved below, independently of these results — and for dis- 
joint intervals along yi, the corresponding reverse evolutions are disjoint. For 
integer n > 0, let A = r/n, then, 

n 

7r(^(0,r))) = X>(y*((j-l)A,iA)) 

i=i 

(30) = ^,fa(Q-l)A,jA)) A 

i=i 



r 



p{yi(s))ds 

o 

as n —> oo. The convergence is justified by the dominated convergence the- 
orem as each of the functions f n (t) = 7r fa( t ~ A A f+A / 2 )) converges pointwise 
to the bounded function p(y(t)). It follows then that 

lim - — = lim - f p(yi(s)) ds = lim p(yi(r)). 



r~^0 r r— >0 r Jo r^O 

And therefore, 

4 

(31) p(x) = Vlimp(y i (r)) ) 

^ — ' r^O 

i=l 

which proves the claim. 

For the upper semicontinuity proof below, note that any sequence x n — > x 
is eventually all regular points, and each x n can lie in only one "quad- 
rant" (relative to A), so limsup n p(3; n ) is no greater than the maximum 
of the lim sup over any subsequence lying in one quadrant. Hence, p{x) > 
limsup n _ >00 p(x n ), so p is upper-semicontinuous at x. 

(4a) For small enough r, B(x,r) contains no critical points of gx- Par- 
tition dB(x,r) into two sets C- m and C out , where C out contains all points 
on dB(x,r), whose paths do not cross the interior of B(x,r). Every point 
in C; n = dB(x,r) — C out either precedes or succeeds a point in B(x,r). So, 
V(Ci n ) C V(B(x,r)) U dB(x,r). Because the flow is smooth with a bounded 
derivative on Uq and because dB(x,r) is compact, C ou t equals an at most 
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countable union of points and arcs on which the flow is tangent to the 
circle. Each such piece produces a single curve under V, so it follows that 
£(V(C ou t)) = 0, which in turn implies that 7r(C ou t) = 0. Because the symmet- 
ric difference ofV(B(x,r)) and V(dB(x,r)) is contained in B(x,r)UV(C out ), 
we have that tt(B(x, r)) = 7r(dB(x, r)) + 0(r 2 ) as r — > 0. 

(4b) For convenience, let u = Vgx '■ —> M 2 denote the gradient function. 
Let it = u(x) and let H be the Hessian of gx, with eigenvalues Ai > A2. Pick 
5 > small. Choose tq > such that: 

(i) B(x,ro) contains no critical points of gx', 
(") r o^f < 1/4; 
(hi) r ^f < 3/4; 

(iv) \\u(y) — u(x) — H(y — sc)|| < 6(\\y — rr||/ro) 2 via Lemma 1. 
Then, let r < vq. 

The plan of the proof is to construct the curve 7 explicitly in the case 
where u is linear and show that the perturbation caused by the addition 
of higher order terms causes only a small change to the curve, consistent 
with the statement of the theorem. The range of the curve will generate the 
same set under V as the open ball around x, up to an 0(r 2 ) term. Consider 
points along the circles around x where the gradient u is tangent to the circle. 
Connecting these points will cut all the integral curves within the ball. Note 
that because the vector field is curl free, it follows from Stokes' theorem 
that there must exist at least two tangent points. That is, because the line 
integral around S is zero, there must be a sign change of the tangent vectors, 
but this requires at least two zeros on the circle. [The application of Stokes' 
theorem comes by defining the differential form u = v 1 dx + v% dy, where 
v = Vg y . This form is closed, that is, du = (D\V2 — D2V1) dx Ady = 0, and 
so = J c u = J v(C(t ))-f(t).} 

Let Bq = B(x,ro), B = B(x,r) and S = dB. We begin by showing that 
when u is linear, it is tangent to S at exactly two isolated points for every 
r < ro . We show further that the component of u normal to S is small only 
in a small neighborhood of these tangent points, which will be used to show 
that such tangent points lie on two small arcs for general u. 

So, for the moment, assume that u(y) = u(x) + H(y — x) on Bq. Recall 
that on Bq. Because H is symmetric, there are two orthogonal, unit 

vectors v% and V2 and two real numbers Ai > A2 (possibly zero) such that 
H(PiVi + P2V2) = P1X1V1 + P2^2V2- For every y S B, y — x can be written as 
(3\Vi + P2V2 for Pi + 02 — 7-2 an d vice versa. Hence, we can write, with some 
abuse of notation, 



(32) 



u(Pi ,P 2 ) = u{0, 0) + Ai/3i«i + X2P2V2, 
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where it(/3i,/?2) = u{x + (5\V\ + ^2^2)- Let a\,ot2 be such that it(0, 0) 
u(x) = ai^i + a2V2- We have by assumption of regularity at x that ||q|| 



a i + Q i > 0- Then, any y G S corresponds to [3 = (rcos#,rsin(9) for some 
9 G [0,2vr), and 

(33) u(/3i,P2) = («i + Air cos 0)ui + (q 2 + A 2 rsin#)f;2. 

Because u is nonzero on Bq, we have that for y G 5, u(y) is tangent to 5 if 
u(Pi,f32) ■ (Pijfh) = 0, or more explicitly, 

= [(«i + Air cos^)fi + (02 + X2f sm9)v2] T (r cos(9)v\ + rsin^)-^) 

(34) 

= r (ai cos 9 + \\r cos 2 9) + r (02 sin 9 + A2r sin 2 0) , 
which implies, canceling r > 0, that 

(35) a\ cos9 + 012 sin# + J" ( — 17— cos 20 + — = 0. 



Because ||ii°|| = ||u(x)|| = ||a|| > 0, there exists rj G [0, 2n) such that cos 77 = 
ai/||a|| and sin 77 = 0:2 / 1 1 01 1 1 - Then, solutions of the above equation corre- 
spond to crossings of the purely imaginary axis of the following complex 
curve c : [0, 2tt) — ► C: 

(36) c{9) = + r^V^V 2 * + r^±^. 

2||a|| 2 1| Q£ || 

This is a generalized epicycloid with phase rj and offset r ^jj^|p along the 
real axis. Write c\ = Re(c), C2 = Im(c), w = r |tct|t > and v = r 2\\lp\\ • 
Following Brannen (2001) and treating (ci — v,C2) as a plane curve, we see 
that the curve has no cusps if the curvature never changes sign and no loops 
if the vector cross-product of the curves position and velocity never changes 
sign. The sign of the curvature equals the sign of 

(37) ^4 ~ c'{c 2 = 8w 2 + 1 + 6w cos(6> + 77) 
and the sign of the vector cross-product equals the sign of 

(38) (ci - v)c' 2 - c[c 2 = 2w 2 + 1 + 2wcos(9 + r/). 

Taking < w < 1 and using the fact that the cosine terms are between —1 
and 1, we see that the former keeps the same sign if w < 1/4 or w > 1/2, 
and the latter keeps the same sign if (2w — l)(w — 1) > 0, which requires 
w < 1/2. The curve is thus locally convex whenever w < 1/4. Moreover, the 
(ci — v ) 2 + c| > (1 — w) 2 , so the curve always lies outside a circle of radius 
1 — w around v. Hence, if0<w;< 1/4 and \v\ < 1 — w, the curve will intersect 
the imaginary axis exactly twice. By assumption, then, the original vector 
field thus has exactly two isolated points of tangency with each circle S. 
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By the implicit function theorem, the collection of these tangent points, 
together with x, form the image of a smooth curve, 7, with 7(0) = x and 
7'(0) nonzero. Every point along this curve is perpendicular to the gradient 
flow, so this curve is part of the integral curve through x for the flow £, so 
we can take j(t) of the form The angular separation between the two 
tangent points forming the branches of 7 can be determined by studying 
the curve c above. Direct calculation shows that the arc length varies with 
9 along the curve as 1 + 0(r 2 ) relative to a circle. It follows that the angle 
between the tangent points is n + 0(r), and thus the length of 7 between 
the center of the circle and each point on the circle at radius r has length 
r + 0(r 2 ). This proves the claim for the linear case. 

In the general case, an error term of order r 2 is added to linear term by 
the approximation Lemma 1. This perturbs the tangent points and can add 
or move them, though as shown above there are always at least two. (We get 
them by following £ forward and backward until we leave the circle, which 
we do eventually because there are no critical points in the neighborhood.) 
We show that these are confined to two small arcs on the circle of length 
0(r 4 ). To do this, note that the size of the outward normal component of 
u(y) is \u(y) ■ (y — x)/\\y — x\\\ = \u(y) ■ (y — x)\/r. The numerator is the left- 
hand side of (35). The normal component might be zero for any 6 G [0,27r) 
for which | Re(c(0)/r)| < Cr 2 , or equivalently | Re(c(0))| < Cr 3 . For a circle, 
this occurs over two arcs of angular size 0(r 3 ) and thus of length 0(r 4 ) 
in Uq. For the general epicycloid, the arc length over any segment can be 
at most 1 + 2w times that of the circle, which again gives an arc length of 
0(r 4 ). 

The 7 curve constructed above connects two tangent points, the remaining 
integral curves can cross the boundary in a curve of length 0(r 4 ) and hence 
by Lemma 2, 

(39) C(V(B(x,r))- V( 7 (K,A-]))) = 0(r 4 ), 

where a r and (3 r are the time indices at which each branch of 7 strikes the 
circle of radius r. The result follows. 

It is worth noting that we can construct a curve 7 for which V(B(x,r)) 
equals V applied to the image of the curve, except for part of B(x,r) itself. 
To do this, we include all the tangent points on the two arcs of length (3(r 4 ) 
and then connect opposite ends with a curve of minimal length through x. 
The resulting curve has length 2r + 0(r 2 ) and hits every equivalence class 
in B(x,r). 

(4c) By Theorem 6(4b) above, it is sufficient to work with V(7([a r , b r ])) for 
sufficiently small r > 0. This is generated by a curve of the form 7(5) = £ s x 
over the interval [a x (r),b x (r)]. Taking a x (0) = b x (Q) = 0, note that a x and 
b x are differentiable functions of r and continuous functions of x. Hence, we 
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have the set A = V(j([a x (r), b x (r)])). Construct a smooth bijection [0, oo) x 
[a(r),b(r)\ to A by z(t,s) = tp^^x. It follows by the change of variables 
and Fubini theorems that 



7T(V(7r([arA])) _ 1 

(40) 



g x (v) dv + 0{r 2 ) 

gx(ip~ t £ s x)J{x, t, s) dt ds + 0(r 2 



I A 

1 rKr) 

r J a (r) JO 

where J(x, t, s) is the Jacobian determinant of z, which is a smooth function 
in x. Taking limits as r — > yields 

roc 

(41) p(x) = (b' x (0) ~ a' x (0)) gx(tp- t ex)J(x,t,0)dt. 



o 

Taking h(x,t) = (£4(0) — a' x (0))J(x,t,0) gives the theorem for p(x). Now 
apply the above to ip^x for any t G R to get the formal statement. 

Boundedness of p on U — M follows from showing uniform boundedness at 
regular points of gx , by items 1 and 3 above. At a regular point x of gx and 
for sufficiently small r > 0, there is by Theorem 6(4b) above and Lemma 2 
a constant K > dependent only on gx such that 7r(7([a x (r), b x (r)])) < Ifr. 
It follows that p(x) < K as was to be proved. 

Finally, to prove that p is upper semi-continuous it is sufficient to show 
that p is continuous at regular points and local minima. The rest follows 
from items 1 and 2, and the proof of item 3 above. 

Suppose x is a local minimum of gx and pick ao > small. Then by the 
proof of item 1 above, there exists an a\ < ao such that a < a\ implies that 
V(B(x,a)) C B(x,a). It follows from Lemma 2 that if y £ B(x,a), then for 
small r < a/2, C(V(B(y,r))) < Cra, the width of the ball times the scale of 
the set B(x,a), where C > is a constant independent of a but dependent 
possibly on ao- If we choose a < ao/C, then ir(B(y,r))/r < ao for r < a/2, 
showing that p{y) < ao- It follows that p(y) — > as y — > x, which proves 
continuity of p at local minima. 

To prove continuity at regular points, we apply Theorem 6(4c) above. Be- 
cause gx and the h in Theorem 6 (4c) are continuous in x, they are bounded 
on Uq. The bounded convergence theorem thus allows an interchange of limit 
and integral. Taking t = and letting x n — > x all be regular points of gx, we 
thus have that 

roo 

(42) p(x) = lim / g x (ip~ s x n )h(x n ,s)ds = lim p(x n ). 

n— >oo Jq n— »oo 

This proves that p is upper semi-continuous. This completes the proof. □ 
PROOF of Theorem 2. 1. By Theorem 6 (4b) and (c), we can write 

(43) 7r(B(x,r))= / / H(£ s x,t) dtds + 0(r 2 ), 
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where by the proof of Theorem 6 (4c), if is a smooth function and (3 r — 
a r = r + 0(r 2 ). If D\H(y,t) denotes the derivative of H with respect to 
its first argument, then Taylor's theorem gives us that H(y,t) = H(x,t) + 
D\H(u{t,y),t) ■ (y — x) for some points u(t,y) on the line between y and 
x. Because H(y,t) and H{x,t) are integrable with respect to t, and y is 
arbitrary in U, it follows that \\DiH(u(t,y),t)\\ is integrable as well. The 
integrals are bounded over Uq by compactness. Hence, for s £ [a r ,f3 r ], 



(44) 



roo roc 

/ H(£ s x,t)dt- / H(x,t)dt 
Jo Jo 



<Cr 



for a constant C > independent of x. The second term in the above absolute 
value is just p(x), so 

(45) ir(B(x,r))= / (p{x) + 0{r)) ds + 0{r 2 ) = rp(x) + 0(r 2 ), 

J a r 

which proves the claim. 

2. Let 7 be the curve in Theorem 6(4b) centered at x and parameterized by 
arc-length over the interval [a, b], where a < < b, \a\ = r + 0(r 2 ), and \b\ = 
r + 0(r 2 ). For positive integer m, pick index points = a + (b — a)k/m, 
for < k < m and cover "f([a, b\) by open balls of radius (b — a) /2m centered 
at each 7(i m fc)- It follows that 

(46) 7T( 7 ([a,6])) = f>(i?(7(W.),^)) + 0(m~ 2 ) 

fc=o v v /7Tl// 

(47) «5>(i mfc )^ + 0(m- 2 ) 



fc=0 

1 



6 



(48) ^-J^ p{n (t)) dt 

(49) = I£p( 7 (t))dt + 0(r 2 ), 

where the limit is over m — > oo. The second line above follows by part 1. 
Because 7v(B(x,r)) = 7r( 7 ([a,&])) + 0(r 2 ), the result follows. □ 



Proof of Lemma 1. This result follows directly from Taylor's theo- 
rem and the compactness of Uq. For example, the remainder term in the 
gradient approximation (second equation in the theorem) can be written 
r 2 (u T Ai(y, x)u, u T A2(y, x)u) T , where u = (y — x)/\\y — x\\ and where the 
Ai(y,x)jk = J (1 — t)d 3 gx(x + t(y — x))/ dxidxj dxkdt. Because the max- 
imum eigenvalue of these matrices is a continuous function of the matrix 
and thus continuous in y [Naulin and Pabst (1994)], each component of this 
vector is bounded. □ 
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Proof of Lemma 2. To begin, suppose 7 is a short segment of length 
r of a curve of the form 7(5) = £ s x, parameterized on the interval [a, b] with 
a < < b. Assume that every point along the curve is a regular point of gx ■ 
For sufficiently small r, the indices a < < b satisfy \b — a\ < nr because 
the derivative of Vgx is bounded above and below (in norm) in a small 
neighborhood of x. 

Every point 7(5) can be classified according to whether lim^oo -0~*7(s) 
is (i) a point on the boundary of Uq; (ii) a local minimum of gx', or (iii) 
a saddle point of gx- If there are no points of class (iii) on the segment, 
then we can take r small enough so that all points are of the same class. 
In general, under Assumption 2, there is at most a finite set of s S [a, b] of 
class (iii), each of whose V(7(s)) has Lebesgue measure 0. So it suffices to 
assume that there are no class (iii) points on [a, b] because without loss of 
generality, we could individually consider the finite open intervals in [a, b] 
on which this is the case. 

Let V denote the set V(7([a, b])) and let I = U a < s <()^7(s)i where I x is the 
intervals of time indices for which the flow from x stays in Uq. Define 



and note that V C V\. Define a mapping h : [0, £] x I — > V\ by h(s, t) = ip^^x. 
Note that if s\ 7^ S2, then the £ Si x are in different equivalence classes and so 
h(s\,t) 7^ h(s2,t) for any t. Similarly, if ti 7^ t2, then ?/> -ii £ s x lie at different 
points along the integral curve. Hence, h is one to one. If y G V±, then by 
construction, y is equivalent to a unique £ s x for s G [a, b] with respect to 
the flow ip. Among all points in V\ equivalent to y with respect to tp, each 
is obtained from the corresponding ^ s x by for a unique t (V> -i is non- 
singular with respect to t). Thus, h has a one-to-one inverse. Moreover, h is 
differentiable because it is the composition of differentiable functions. Thus, 
h is a smooth bijection. 

It follows then, by a change of variables and Fubini's theorem, that 



where J is the absolute Jacobian determinant, and C = k\I\ sup s t J{s,t) < 
00 by the compactness of [a, b] x /. 

Suppose now we take 7 to be a general curve as stipulated in the theorem, 
of length r > 0. Then, by taking r sufficiently small, Lemma 1 shows that the 
gradients of gx along 7 point in the same direction up to 0(r). We can thus 
find a segment of an integral curve of £ within 0{r) of 7 that cuts across 
all the equivalence classes that 7 hits. The Lebesgue measure of V applied 
to the latter curve is bounded by Cr, and the additional area between the 
curves is 0(r 2 ). 



Vx ={f'^:se [a,b],t el} CU 



(50) 
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Having proved the result for sufficiently small segments, a general curve 
can be decomposed into smaller segments for which the above arguments 
apply. The measure of the V-induced set is then bounded above by a sum 
of the upper bounds along the curve, which gives a bound of 0(r) as was to 
be proved. □ 

7. Proofs for Section 3. Through this section, sup x denotes the supre- 
mum over all regular points, hence, the rates of convergence resulting from 
Theorem 3 and Corollary 1 are uniform over the set of regular points of gx ■ 

To prove Theorem 3, we proceed as follows. Define 

Then, 

\p n (x) -p(x)\ < \p n (x) - p* n (x)\ + \p* n (x) -E\p* n (x)}\ + |E[p*(x)] -p(x)\. 
We bound these terms using the following three results. 

Theorem 7. Under the assumptions in Theorem 3, 

■2 



sup|p„(x) -p* n (x)\ =OU — — A ) +0(h n ) a.s. 



aog(i/frn 

Theorem 8. Under the assumptions in Theorem 3 



sup\p* n (x)-E[p* n (x)]\=0 P \ 



'logra 



nvr, 



Theorem 9. Under the assumptions in Theorem 3, 
sup |E[p* (x)] -p{x) \ =0(y n ). 

X 

Theorem 3 follows by combining these results. Before proving these the- 
orems, we need a few preliminary results. 

Lemma 3. // (K1)-(K3) hold and nh\j log (l/h n ) — > oo, then 



/log hn 1 

■) ~gx{X)\ =ui y 
If, in addition, nh\j log(l//t n ) ^oo, then 



sup\g n (x)- g x (x)\ =0\ \/ ^7 J +°( h i) a - s - 



sup \Vg n (x) - Vg x (x)\ = O (f^-) + 0(h 2 n ) a.s. 
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The first result above is Theorem 2.3 of Gine and Guillou (2002). The 
second result may be proved similarly; see also Gine and Koltchinskii (2006). 
Let 

D(x,y) = inf — x\\, D n (x,y) = inf \\s — x\\. 



Lemma 4. If nh^/\og(l/h n ) —> oo, th 



en 



x y 



supsup|L>(2;,?/) - D n (x,y)\ =0[ \\ ) +0{h 2 n ) a.s. 



Proof. Follows from compactness and Lemma 3. □ 

Proof of Theorem 7. Note that 
1 n 

p n {x) - p* n (x) = J2(K (D n (x, Xi)/u n ) - K(D(x, Xi)/u n )). 

Since K' is uniformly bounded, the result follows by Taylor expanding 
K(D n (x,Xi)/v n ) around D(x,Xi) and applying the previous lemma. □ 

Since K is nonincreasing, there exists a cumulative distribution function 
M with support [0,oo) such that 

K(x) = J™h [0>s] (x)dM(s). 

Note that K(x) = ^ dM(s) and that condition (K3) above implies that 
all moments of M are finite and that 1 — M(x) = 0(x 2 e~ x ). 
The path density estimator p* in (51) can be written as 

1 " 1 v ( D{x,Xj) \ 1-1 /—I fD{x^Q\ 

Pn(*) = ~X,- K (— =-L- / -Ms] — : dM ( S ) 



n ~[v n \ v n J n<—' l v n jQ s 

1 n roo 1 

E / -l[ ,s Vn ](D{x,Xi))dM{8). 
■ T Jo s 



nv n ~[Jo s 



To prove Theorem 8 we will use Talagrand's (1994) inequality. The version 
we use is from Gine and Guillou (2002). If V is a class of functions, let 
N(T,L2(P),6) be the smallest number of balls of radius 5 needed to cover 



r with respect to the metric J J (f(x) — g(x)) 2 dP(x), with f,g£T. 
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Theorem 10 (Gine and Guillou). Let V be a class of uniformly bounded 
functions such that there exist A > and d>l for which 



(52) 



sup N\T,L 2 (P),e J J G 2 (x)dP(xU < (jj 



where G is an envelope for V , and the supremum is over all probability 
measures. Let a 2 > sup 7gr Var(7(X)) and U > sup 7gr ||7||oo and < a < U . 
Then, there exist constants B,C and L such that the following is true. If 



e> — ( [/log 



n 



(v) 



+ y/na\ log 



(") 



then 



sup 

, 7 er 



^ 7 (X i )-E( 7 (X i )) 



n ^ . 



> e 



< Lexp 



ne 
LU 



log 1 + 



neU 



L(^Ea + U^log{AU/a)) 2 , 
Proof of Theorem 8. Let 7 XjS = l[ 0)S „ n ](.D(x, •)) and write p* n as 

1 n POO 1 

Pn<?) = E / - W](^M))dM( S ) 

r°° l l n 

= / J2i x , a (Xi)dM(8). 

Jo sv n n ^ 

Theorem 11 below shows that the class of functions 

T n = {"fx,s,x 6 U ,0 < s < logn} 

satisfies the covering condition (52), hence Talagrand's inequality holds with 
U = 1 and a 2 = 0(u n logn). Set Q n = yl~ogn. Then, 



P sup 



i=l 



> Qni 



'f n logn 



n 



< Lexp 



nQ n ^v n logn/n 



x log 1 + 



nQ n yJv n logn/n 



L{^Jnv n logn+ ^log{A/^u n logra)) 2 
so that — with probability tending to 1 — we have 

-J2^A x i)- E hxAx)} =c 

n \ V n 



o(l) 



sup 



1=1 



'f n logn 
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It follows that 
sup|p*(x)-E[p*(s)]| 



r°° l 

< sup / 

x JO SV n 
rlogn Y 



~5>*,«CXi)-Efo»,.(*)] 



i=i 



dM(s) 



< sup 

x JO SU n 



1 " 



n 



i=l 



dM(s) + 



1 



1 



l^n J logn S 



dM{s) 



< sup 



logn ^ 

sup 

SV n s g(o,logn) 



J2~? x>s (Xi)-E[~/ x , s (X)} 



n f . 



dM(s) 



1 



h — .fr(iogra) 



sup sup 

x se(0,logn) 

H if (logn) 



j27xA^)-nixAx)\ 



i=i 



i 

Vn JO 



logn 



dM(s) 



K(0) 
< —-^ sup 

v n 7a;,ser n 



-^^.(liJ-E^X)] 
n f— f 



i=l 



+ 



K(logn) 



Hence, 



sup\p* n (x)-E\p* n (x)]\ = ^-0 P \ 



f n logn 



7) 



1 ^/losr 



Op I 



logn 



□ 



Proof of Theorem 9. Note that D(x,Xi) < sv n if and only if Xi G 
V(-B(x, s/y n )), hence 

E[l [0 ,^](I>(a;,X))]=^(V(B( a r, S i/ B )))=7r(B(a;,fli/ n )). 

The expected value of p* is 



E[p*(x)]-p(x) 



— E 



■l [0>s , n] ( J D(x,X)) ( iM( S )-p(x) 



- r-M[l [0t3Vn] (D(x,X))]dM(s) -p(x) 
Vn Jo s 



-ir(B(x, sv n )) dM(s) - p(x) 



Vn JO S 
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1 f°° 1 

— / - [rc(B(x,si> n )) - sv n p(x) \dM{s) 

Vn JO S 

0(u n ). 



To verify that the last term is 0{u n ), note that the integrand is not greater 
than - + v n p(x) and, from Theorem 2, there exists r x such that for su n < r x 
the integrand is smaller than -Cs 2 u 2 , so that 



-n(B(x,sv n )) - su n p(x) 
s 



Also, r = inf T r T > 0. Hence 



Csvl 



< 



s < — , 
v n 

s> — . 



1 



1 



V n JO S 



tt(B(x, sv n )) - sv n p(x)} dM{s) 



rr/v n poo fi \ 

< / Csv n dM(s)+ / (-+p{x))dM(s) 



< Cu n J™ s dM(s) + (1 + p(x)j (l - M (-pj ^ , 

which is 0(v n ), due to the tail condition on M and the existence of its 
moments. Thus, 



sup\E\p* n (x)}-p(x)\=0(v). 



□ 



Theorem 11. Let k be the number of critical points of gx and letm<k 
be the number of modes. The VC dimension of the class of sets 

U = {V(B(x, su n )) : x G U , < s < log n) 

is less than 4m. Hence, the covering condition (52) holds. 

Proof. Let bi,...,b m denote the modes of gx and let £{x) = {tp (x), < 
t < oo}. Note that S(x) is a smooth curve starting at x and ending at some 
mode of gx- Partition Uq into sets U\, . . . , U m where x G Uj if ip°°(x) = bj. 
Let F be a finite set containing 4m points. Then there exists at least one Uj 
such that F fl Uj has at least 4 points. Let G C F (~)Uj be a subset of size 4. 
So G7 = {xi, X2, £3, £4}, say. We will show that G cannot be shattered. 

Step 1. If Xfc G — {bj} for x& / (where Xk,X£ G G) then clearly G 
cannot be shattered. Thus we can assume that E^V\Ei = {bj} for each k^£. 

Step 2. Let R = V(B(z,p)) G 11. We claim that R picks out AcG if and 
only if 

(53) B(z, p) n 8{xj) 7^ for each G A 
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Fig. 5. The set R — V(B(z, p)) picks out {xi,X2} (above) if and only if the forward 
evolutions £(xi) and £(x2) hit the ball B(z,p) (below). 

and 

(54) B(z, p) n £(xj) = for each Xj £G - A. 

This follows since x € V(B(z, p)) if and only if B(x,p) D £(x) ^ 0. See Fig- 
ure 5. 

Step 3. Place a small ball around bj and renumber the points in G by the 
angle of the vector from z to the intersection of £(xj) with the boundary of 
the ball. See Figure 6. 

Step 4. We claim that if there exists an R that picks out A = {xi,xs\ 
then there does not exist an R' that picks out {x2,x^}. Suppose R = B(z,p) 
picks out A. Because of (53) and (54), both £{x\) and £ (x%) pass through 
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B(z,p) but neither £(£2) and £(#4) pass through B(z,p). See Figure 7. 
Choose y\ E £(x±) n B(z,p) and 2/2 G £(£3) H B(z,p). Define a closed curve 
C as follows. C begins at the mode bj, follows £{x\) to y\, connects y\ to y^ 
by any smooth curve contained in B{z,p) and the follows £{x2) back to bj. 
Now C encloses £(^2) and excludes £(£4). Hence, there is no ball B(z',p) 
satisfying (53) and (54) for A' = {x2,X4}. Thus, there does not exist an R' 
that picks out {x2,Xi}. We conclude that G cannot be shattered. □ 

8. Proofs for Section 4. 

Proof of Theorem 4. We prove the theorem in the background- free 
case. The result follows in the general case because the gradient of gx on U$ 
changes by a constant factor, yielding the same function p(x). Let D{x,r) 
denote the set V(7([a r , (3 r ])) m Theorem 6 (4b) within the ball of radius r 
around x. 

Consider the following intermediate claims: 

1. x<y implies gx(x) < gx(y)- 

Let 7(4) = ^~ l y, t > 0. Then, -y'(t) = -Vgxdit)) and there exists t\ > 
such that 7(ti) = x. Let v(t) = gx("f(t)). Then, v(0) = gx(y), v(t) = 
V 5 x(7W)-7'(*) = -]|V 5 x(7(*))l| 2 < 0. Hence, v(t) = v(0)+f*v(t)dt < v(0), 
which proves the claim. 

2. gxjx) < f a (d(x,A)). 

Let x denote the point in A that is closest to x. Then, gx(x) is no greater 
than g(x), where g is the normal mixture with all its mass at x. Thus, 
g(x) = ip a (x — x) = ip a (d(x,A)), proving the claim. 

3. If x E B(A,d(\) + e) c and r < e, every y E V(B(x,r)) and every y E 
V(D(x,r)) satisfies 

(55) g x (y)<ip a (d(X)). 
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Fig. 7. If V(B(z, p)) picks out {11,2:3} then £(x\) and £(12) hit the ball B(z,p) (top). 
We can then join the two curves £(xi) and £(x2) forming a closed curve C that isolates 
X2 from X4- Note that all £(xj) must end at the mode (the large dot) since these points 
are all in the same element of the partition Uj . 

Every element of B(x, r) is farther than d{\) from A, where d(\) is defined 
in (14), so the first claim follows by 1 and 2. By the construction in the proof 
of Theorem 6, D(x,r) C B(x,r), so the second claim follows as well. 

4. For regular points x of gx, C(V(D(x,r))) < Cr + 0(r 2 ), for a constant 
that depends only on Uq. 

The claim follows pointwise directly from Lemma 2. The proof of that 
lemma shows that because Uq is compact, the constants for each point are 
bounded. 

Now we prove the main result. If x S A4, the local maxima of gx, then 
p(x) = 00 by Theorem 6. If x E H, then p(x) < Asup y p(y), where the supre- 
mum is over any sufficiently small ball around x. Hence, if x E B(A, d(4A) + 
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C 



Fig. 8. Simulated example with background noise: data set (plot A), steepest ascent paths 
of all data points (plot B), paths after cutting the first seven iterations (plot C) and level 
set at 90% quantile of the estimated path density (plot Y>). 

e)°, the regular points in a small enough ball around x will lie in B(A, d(4A) + 
ef. 

If x G B(A,d(X) + e) c is a regular point of gx, then for small r > 0, 
C{V{D{x,r))) < Cr + 0(r 2 ), for a constant C independent of x. Hence, 

(56) vx(V(D(x,r))) < CrM^W) + 0(r 2 ). 

Thus p(x) < C(j) C7 {d{\)) = A, and the result follows. □ 



Proof of Theorem 5. The theorem follows directly from the previous 
results and from Theorem 3 of Cuevas and Fraiman (1997). □ 
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